2022 Conference article Open Access

How routing strategies impact urban emissions
Cornacchia G., Bohm M., Mauro G., Nanni M., Pedreschi D., Pappalardo L.
Navigation apps use routing algorithms to suggest the best path to reach a user's desired destination. Although undoubtedly useful, navigation apps' impact on the urban environment (e.g., CO2 emissions and pollution) is still largely unclear. In this work, we design a simulation framework to assess the impact of routing algorithms on carbon dioxide emissions within an urban environment. Using APIs from TomTom and OpenStreetMap, we find that settings in which either all vehicles or none of them follow a navigation app's suggestion lead to the worst impact in terms of CO2 emissions. In contrast, when just a portion (around half) of vehicles follow these suggestions, and some degree of randomness is added to the remaining vehicles' paths, we observe a reduction in the overall CO2 emissions over the road network. Our work is a first step towards designing next-generation routing principles that may increase urban well-being while satisfying individual needs.
Source: SIGSPATIAL '22 - 30th International Conference on Advances in Geographic Information Systems, Seattle, Washington, 1-4/11/2022
DOI: 10.1145/3557915.3560977
DOI: 10.48550/arxiv.2207.01456
Project(s): SoBigData-PlusPlus via OpenAIRE

2020 Journal article Open Access

Causal inference for social discrimination reasoning
Qureshi B., Kamiran F., Karim A., Ruggieri S., Pedreschi D.
The discovery of discriminatory bias in human or automated decision making is a task of increasing importance and difficulty, exacerbated by the pervasive use of machine learning and data mining. Currently, discrimination discovery largely relies upon correlation analysis of decision records, disregarding the impact of confounding biases. We present a method for causal discrimination discovery based on propensity score analysis, a statistical tool for filtering out the effect of confounding variables. We introduce causal measures of discrimination which quantify the effect of group membership on the decisions, and highlight causal discrimination/favoritism patterns by learning regression trees over the novel measures. We validate our approach on two real-world datasets. Our proposed framework for causal discrimination has the potential to enhance the transparency of machine learning with tools for detecting discriminatory bias both in the training data and in the learning algorithms.
Source: Journal of intelligent information systems 54 (2020): 425–437. doi:10.1007/s10844-019-00580-x
DOI: 10.1007/s10844-019-00580-x
DOI: 10.48550/arxiv.1608.03735
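
The idea of filtering out confounders before measuring the effect of group membership can be illustrated with a minimal sketch. It is not the authors' method: it assumes a single discrete confounder, so that the propensity score (the probability of belonging to the protected group given the confounder) is constant within each confounder value and stratifying on the confounder is equivalent to stratifying on the score. The function names and the toy records are hypothetical; the paper's causal measures and regression trees are not reproduced here.

```python
from collections import defaultdict


def naive_gap(records):
    """Difference in positive-decision rates (others minus protected), ignoring confounders."""
    protected = [decision for group, _, decision in records if group == 1]
    others = [decision for group, _, decision in records if group == 0]
    return sum(others) / len(others) - sum(protected) / len(protected)


def propensity_by_stratum(records):
    """Estimate P(group = 1 | confounder) as the protected share within each stratum."""
    counts = defaultdict(lambda: [0, 0])  # confounder -> [protected count, total count]
    for group, confounder, _ in records:
        counts[confounder][0] += group
        counts[confounder][1] += 1
    return {value: protected / total for value, (protected, total) in counts.items()}


def stratified_gap(records):
    """Size-weighted average of the within-stratum decision-rate gaps."""
    strata = defaultdict(list)
    for group, confounder, decision in records:
        strata[confounder].append((group, decision))
    weighted_gap, total = 0.0, 0
    for rows in strata.values():
        protected = [d for g, d in rows if g == 1]
        others = [d for g, d in rows if g == 0]
        if not protected or not others:
            continue  # no overlap between groups: the stratum is not comparable
        gap = sum(others) / len(others) - sum(protected) / len(protected)
        weighted_gap += gap * len(rows)
        total += len(rows)
    return weighted_gap / total if total else 0.0


# Hypothetical toy data: (group, confounder, decision), group 1 = protected.
records = (
    [(1, "low", 1)] * 2 + [(1, "low", 0)] * 6
    + [(0, "low", 1)] * 1 + [(0, "low", 0)] * 3
    + [(1, "high", 1)] * 3 + [(1, "high", 0)] * 1
    + [(0, "high", 1)] * 6 + [(0, "high", 0)] * 2
)
```

In this toy dataset the raw acceptance rates differ between the two groups, yet the gap vanishes once records are compared within the same stratum, which is the kind of confounding a purely correlational analysis would miss.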

2020 Contribution to book Open Access

Explaining multi-label black-box classifiers for health applications
Panigutti C., Guidotti R., Monreale A., Pedreschi D.
Today the state-of-the-art performance in classification is achieved by the so-called "black boxes", i.e. decision-making systems whose internal logic is obscure. Such models could revolutionize the health-care system; however, their deployment in real-world diagnosis decision support systems is subject to several risks and limitations due to the lack of transparency. The typical classification problem in health-care requires a multi-label approach since the possible labels are not mutually exclusive, e.g. diagnoses. We propose MARLENA, a model-agnostic method which explains multi-label black box decisions. MARLENA explains an individual decision in three steps. First, it generates a synthetic neighborhood around the instance to be explained using a strategy suitable for multi-label decisions. It then learns a decision tree on such neighborhood and finally derives from it a decision rule that explains the black box decision. Our experiments show that MARLENA performs well in terms of mimicking the black box behavior while gaining at the same time a notable amount of interpretability through compact decision rules, i.e. rules with limited length.
Source: Precision Health and Medicine. A Digital Revolution in Healthcare, edited by Arash Shaban-Nejad, Martin Michalowski, pp. 97–110, 2020
DOI: 10.1007/978-3-030-24409-5_9

See at: media.springer.com Open Access | doi.org Restricted | link.springer.com | CNR ExploRA

2020 Journal article Open Access

Authenticated Outlier Mining for Outsourced Databases
Dong B., Wang H., Monreale A., Pedreschi D., Giannotti F., Guo W.
The Data-Mining-as-a-Service (DMaS) paradigm is becoming the focus of research, as it allows the data owner (client) who lacks expertise and/or computational resources to outsource their data and mining needs to a third-party service provider (server). Outsourcing, however, raises some issues about result integrity: how can the client verify that the mining results returned by the server are both sound and complete? In this paper, we focus on outlier mining, an important mining task. Previous verification techniques use an authenticated data structure (ADS) for correctness authentication, which may incur high space and communication costs. We propose a novel solution that returns a probabilistic result integrity guarantee at a much cheaper verification cost. The key idea is to insert a set of artificial records (ARs) into the dataset, from which a set of artificial outliers (AOs) and artificial non-outliers (ANOs) is constructed. The AOs and ANOs are used by the client to detect any incomplete and/or incorrect mining results with a probabilistic guarantee. The main challenge that we address is how to construct ARs so that they do not change the (non-)outlierness of original records, while guaranteeing that the client can identify ANOs and AOs without executing mining. Furthermore, we build a strategic game and show that a Nash equilibrium exists only when the server returns correct outliers. Our implementation and experiments demonstrate that our verification solution is efficient and lightweight.
Source: IEEE transactions on dependable and secure computing 17 (2020): 222–235. doi:10.1109/TDSC.2017.2754493
DOI: 10.1109/tdsc.2017.2754493
Project(s): CAREER: Verifiable Outsourcing of Data Mining Computations via OpenAIRE, SaTC-EDU: EAGER: Development and Evaluation of Privacy Education Tools via Open Collaboration via OpenAIRE

See at: IEEE Transactions on Dependable and Secure Computing Open Access | ieeexplore.ieee.org Restricted | CNR ExploRA
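
The client-side check described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's construction: it takes the sets of artificial outliers and artificial non-outliers as given (how to build them without altering the outlierness of real records is the paper's contribution and is not shown), and the escape probability assumes a server that cannot tell artificial records from real ones and drops each true outlier independently with a fixed probability. All names are hypothetical.

```python
def verify_result(returned_ids, artificial_outliers, artificial_non_outliers):
    """Accept the server's answer only if every AO is present and no ANO is present."""
    returned = set(returned_ids)
    missing = set(artificial_outliers) - returned        # evidence of an incomplete result
    spurious = set(artificial_non_outliers) & returned   # evidence of an incorrect result
    return not missing and not spurious


def escape_probability(omitted_fraction, num_artificial_outliers):
    """Chance that a server omitting outliers at the given rate keeps all AOs by luck.

    Assumes independent omissions and AOs indistinguishable from real outliers.
    """
    return (1.0 - omitted_fraction) ** num_artificial_outliers


# Hypothetical identifiers: "r*" are real records, "ao*"/"ano*" are artificial.
honest = ["r7", "r19", "ao1", "ao2"]
lazy = ["r7", "ao1"]            # drops ao2, so the result is incomplete
sloppy = ["r7", "ao1", "ao2", "ano1"]  # reports a known non-outlier
```

Under these assumptions the guarantee is probabilistic: the more artificial outliers are planted, the smaller the chance that a cheating server passes the check, while verification itself costs only two set operations.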

2020 Journal article Open Access

(So) Big Data and the transformation of the city
Andrienko G., Andrienko N., Boldrini C., Caldarelli G., Cintia P., Cresci S., Facchini A., Giannotti F., Gionis A., Guidotti R., Mathioudakis M., Muntean C. I., Pappalardo L., Pedreschi D., Pournaras E., Pratesi F., Tesconi M., Trasarti R.
The exponential increase in the availability of large-scale mobility data has fueled the vision of smart cities that will transform our lives. The truth is that we have just scratched the surface of the research challenges that should be tackled in order to make this vision a reality. Consequently, there is an increasing interest among different research communities (ranging from civil engineering to computer science) and industrial stakeholders in building knowledge discovery pipelines over such data sources. At the same time, this widespread data availability also raises privacy issues that must be considered by both industrial and academic stakeholders. In this paper, we provide a wide perspective on the role that big data have in reshaping cities. The paper covers the main aspects of urban data analytics, focusing on privacy issues, algorithms, applications and services, and georeferenced data from social media. In discussing these aspects, we leverage, as concrete examples and case studies of urban data science tools, the results obtained in the "City of Citizens" thematic area of the Horizon 2020 SoBigData initiative, which includes a virtual research environment with mobility datasets and urban analytics methods developed by several institutions around Europe. We conclude the paper outlining the main research challenges that urban data science has yet to address in order to help make the smart city vision a reality.
Source: International Journal of Data Science and Analytics (Print) 1 (2020). doi:10.1007/s41060-020-00207-3
DOI: 10.1007/s41060-020-00207-3
Project(s): SoBigData via OpenAIRE

2020 Report Open Access

Predicting seasonal influenza using supermarket retail records
Miliou I., Xiong X., Rinzivillo S., Zhang Q., Rossetti G., Giannotti F., Pedreschi D., Vespignani A.
Increased availability of epidemiological data, novel digital data streams, and the rise of powerful machine learning approaches have generated a surge of research activity on real-time epidemic forecast systems. In this paper, we propose the use of a novel data source, namely retail market data, to improve seasonal influenza forecasting. Specifically, we consider supermarket retail data as a proxy signal for influenza, through the identification of sentinel baskets, i.e., products bought together by a population of selected customers. We develop a nowcasting and forecasting framework that provides estimates for influenza incidence in Italy up to 4 weeks ahead. We make use of the Support Vector Regression (SVR) model to produce the predictions of seasonal flu incidence. Our predictions outperform both a baseline autoregressive model and a second baseline based on product purchases. The results show quantitatively the value of incorporating retail market data in forecasting models, acting as a proxy that can be used for the real-time analysis of epidemics.
Source: ISTI Technical Reports 2020/009, 2020
DOI: 10.32079/isti-tr-2020/009
Project(s): SoBigData-PlusPlus via OpenAIRE

See at: ISTI Repository Open Access | CNR ExploRA

2019 Journal article Open Access

Personalized market basket prediction with temporal annotated recurring sequences
Guidotti R., Rossetti G., Pappalardo L., Giannotti F., Pedreschi D.
Nowadays, a hot challenge for supermarket chains is to offer personalized services to their customers. Market basket prediction, i.e., supplying the customer a shopping list for the next purchase according to her current needs, is one of these services. Current approaches are not capable of capturing at the same time the different factors influencing the customer's decision process: co-occurrence, sequentiality, periodicity and recurrency of the purchased items. To this aim, we define a pattern, the Temporal Annotated Recurring Sequence (TARS), able to capture simultaneously and adaptively all these factors. We define the method to extract TARS and develop a predictor for the next basket named TBP (TARS Based Predictor) that, on top of TARS, is able to understand the level of the customer's stocks and recommend the set of most necessary items. By adopting TBP, supermarket chains could produce tailored suggestions for each individual customer, which in turn could effectively speed up their shopping sessions. Extensive experiments show that TARS are able to explain the customer purchase behavior, and that TBP outperforms the state-of-the-art competitors.
Source: IEEE transactions on knowledge and data engineering (Print) 31 (2019): 2151–2163. doi:10.1109/TKDE.2018.2872587
DOI: 10.1109/tkde.2018.2872587
Project(s): SoBigData via OpenAIRE

2019 Journal article Open Access

A survey of methods for explaining black box models
Guidotti R., Monreale A., Ruggieri S., Turini F., Giannotti F., Pedreschi D.
In recent years, many accurate decision support systems have been constructed as black boxes, that is, as systems that hide their internal logic from the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, it explicitly or implicitly delineates its own definition of interpretability and explanation. The aim of this article is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help the researcher to find the proposals most useful for his own work. The proposed classification of approaches to open black box models should also be useful for putting the many open research questions in perspective.
Source: ACM computing surveys 51 (2019). doi:10.1145/3236009
DOI: 10.1145/3236009
DOI: 10.48550/arxiv.1802.01933
Project(s): SoBigData via OpenAIRE

2019 Journal article Open Access

The Italian music superdiversity. Geography, emotion and language: one resource to find them, one resource to rule them all
Pollacci L., Guidotti R., Rossetti G., Giannotti F., Pedreschi D.
Globalization can lead to a growing standardization of musical contents. Using a cross-service multi-level dataset we investigate the actual Italian music scene. The investigation highlights the Italian musical superdiversity both by analyzing the geographical and lexical dimensions individually and by combining them. Using different kinds of features over the geographical dimension leads to two similar, comparable and coherent results, confirming the strong and essential correlation between melodies and lyrics. The profiles identified are markedly distinct from one another with respect to sentiment, lexicon, and melodic features. Through a novel application of a sentiment spreading algorithm and songs' melodic features, we are able to highlight discriminant characteristics that violate the standard regional political boundaries, reconfiguring them following the actual musical communicative practices.
Source: Multimedia tools and applications (Dordrecht. Online) 78 (2019): 3297–3319. doi:10.1007/s11042-018-6511-6
DOI: 10.1007/s11042-018-6511-6
Project(s): SoBigData via OpenAIRE

2019 Contribution to book Open Access

Analysis and visualization of performance indicators in university admission tests
Natilli M., Fadda D., Rinzivillo S., Pedreschi D., Licari F.
This paper presents an analytical platform for performance evaluation and anomaly detection in admission tests for public universities in Italy. Each test is personalized for each student and is composed of a series of questions, classified into different domains (e.g. maths, science, logic, etc.). Since each test is unique in its composition, it is crucial to guarantee a similar level of difficulty for all the tests in a session. For this reason, a domain expert assigns a level of difficulty to each question. Thus, the overall difficulty of a test depends on the correct classification of each item. We propose two approaches to detect outliers. The first is a visualization-based approach using dynamic filters and responsive visual widgets. The second is a data mining approach that evaluates the performance of the different questions over five years. We used clustering to group the questions according to a set of performance indicators to provide a data-driven labeling of the level of difficulty. The measured level is compared with the level assigned a priori by experts. The misclassifications are then highlighted to the expert, who will be able to refine the question or the classification. Sequential pattern mining is used to check if biases are present in the composition of the tests and their performance. This analysis is meant to exclude overlaps or direct dependencies among questions. Analyzing co-occurrences, we are able to state that the composition of each test is fair and uniform for all the students, even across several sessions. The analytical results are presented to the expert through a visual web application that loads the analytical data and indicators and composes an interactive dashboard. The user may explore the patterns and models extracted by filtering and changing thresholds and analytical parameters.
Source: Formal Methods. FM 2019 International Workshops, edited by Emil Sekerinski et al., pp. 186–199, 2019
DOI: 10.1007/978-3-030-54994-7_14

See at: ISTI Repository Open Access | doi.org Restricted | link.springer.com | CNR ExploRA

2019 Conference article Open Access

A visual analytics platform to measure performance on university entrance tests
Boncoraglio D., Deri F., Distefano F., Fadda D., Filippi G., Forte G., Licari F., Natilli M., Pedreschi D., Rinzivillo S.
Data visualization dashboards provide an efficient approach that helps to improve the ability to understand the information behind complex databases. With such tools it is possible to create new insights, to represent key indicators of the activity, and to communicate (in real time) snapshots of the state of the work. In this paper, we present a visual analytics platform created for the exploration and analysis of performance data on entrance tests taken by Italian students when entering the university career. The data is provided by CISIA (Consorzio Interuniversitario Sistemi Integrati per l'Accesso), a non-profit consortium formed exclusively by public universities. With this platform, it is possible to explore the performance of the students along different dimensions, such as gender, high school of origin, type of test and so on.
Source: 27th Italian Symposium on Advanced Database Systems, Castiglione della Pescaia (Grosseto), Italy, 16-19 June 2019

See at: ceur-ws.org Open Access | ISTI Repository | CNR ExploRA

2019 Journal article Open Access

The AI black box explanation problem
Guidotti R., Monreale A., Pedreschi D.
Explainable AI is an essential component of a "Human AI", i.e., an AI that expands human experience, instead of replacing it. It will be impossible to gain the trust of people in AI tools that make crucial decisions in an opaque way without explaining the rationale followed, especially in areas where we do not want to completely delegate decisions to machines.
Source: ERCIM news (2019): 12–13.
Project(s): SoBigData via OpenAIRE

See at: ercim-news.ercim.eu Open Access | ISTI Repository | CNR ExploRA

2018 Contribution to book Open Access

How data mining and machine learning evolved from relational data base to data science
Amato G., Candela L., Castelli D., Esuli A., Falchi F., Gennaro C., Giannotti F., Monreale A., Nanni M., Pagano P., Pappalardo L., Pedreschi D., Pratesi F., Rabitti F., Rinzivillo S., Rossetti G., Ruggieri S., Sebastiani F., Tesconi M.
During the last 35 years, data management principles such as physical and logical independence, declarative querying and cost-based optimization have led to profound pervasiveness of relational databases in any kind of organization. More importantly, these technical advances have enabled the first round of business intelligence applications and laid the foundation for managing and analyzing Big Data today.
Source: A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years, edited by Sergio Flesca, Sergio Greco, Elio Masciari, Domenico Saccà, pp. 287–306, 2018
DOI: 10.1007/978-3-319-61893-7_17

See at: arpi.unipi.it Open Access | ISTI Repository | doi.org Restricted | link.springer.com | CNR ExploRA

2018 Journal article Open Access

NDlib: a Python library to model and analyze diffusion processes over complex networks
Rossetti G., Milli L., Rinzivillo S., Sirbu A., Giannotti F., Pedreschi D.
Nowadays the analysis of dynamics of and on networks represents a hot topic in the social network analysis playground. To support students, teachers, developers and researchers, in this work we introduce a novel framework, namely NDlib, an environment designed to describe diffusion simulations. NDlib is designed to be a multi-level ecosystem that can be fruitfully used by different user segments. For this reason, upon NDlib, we designed a simulation server that allows remote execution of experiments as well as an online visualization tool that abstracts its programmatic interface and makes available the simulation platform to non-technicians.
Source: International Journal of Data Science and Analytics (Online) 5 (2018): 61–79. doi:10.1007/s41060-017-0086-6
DOI: 10.1007/s41060-017-0086-6
DOI: 10.48550/arxiv.1801.05854
Project(s): CIMPLEX via OpenAIRE, SoBigData via OpenAIRE
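
To make the notion of a diffusion simulation concrete, here is a minimal sketch of a discrete-time SIR (susceptible, infected, recovered) process on a network. It deliberately does not use NDlib's own interface: it is a plain standard-library illustration, and the function name, the parameters `beta` (per-contact infection probability) and `gamma` (recovery probability), and the ring graph are all assumptions made for the example.

```python
import random


def sir_simulation(adjacency, initially_infected, beta, gamma, steps, seed=0):
    """Run a synchronous SIR process and return per-step counts of S, I and R nodes."""
    rng = random.Random(seed)
    status = {node: "S" for node in adjacency}
    for node in initially_infected:
        status[node] = "I"

    def counts(current):
        values = list(current.values())
        return {state: values.count(state) for state in ("S", "I", "R")}

    history = [counts(status)]
    for _ in range(steps):
        new_status = dict(status)  # updates are based on the previous step only
        for node, state in status.items():
            if state != "I":
                continue
            for neighbor in adjacency[node]:
                if status[neighbor] == "S" and rng.random() < beta:
                    new_status[neighbor] = "I"
            if rng.random() < gamma:
                new_status[node] = "R"
        status = new_status
        history.append(counts(status))
    return history


# Hypothetical test network: a ring of 20 nodes, each linked to its two neighbours.
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
history = sir_simulation(ring, initially_infected=[0], beta=0.5, gamma=0.2, steps=30)
```

Each entry of `history` is a dictionary of node counts per state, which is enough to plot an epidemic curve or to compare runs with different parameters.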

2018 Conference article Open Access

Discovering Mobility Functional Areas: A Mobility Data Analysis Approach
Gabrielli L., Fadda D., Rossetti G., Nanni M., Piccinini L., Pedreschi D., Giannotti F., Lattarulo P.
How do we measure the borders of urban areas and therefore decide which are the functional units of the territory? Nowadays, we typically do that just by looking at census data, while in this work we aim to identify functional areas for mobility in a completely data-driven way. Our solution makes use of human mobility data (vehicle trajectories) and consists of an agglomerative process which gradually groups together those municipalities that maximize internal vehicular traffic while minimizing external traffic. The approach is tested against a dataset of trips involving individuals of an Italian Region, obtaining a new territorial division which allows us to identify mobility attractors. Leveraging such partitioning and external knowledge, we show that our method outperforms the state-of-the-art algorithms. Indeed, the outcome of our approach is of great value to public administrations for creating synergies within the aggregations of the territories obtained.
Source: 9th Conference on Complex Networks, CompleNet, pp. 311–322, Boston, 6/03/2018
DOI: 10.1007/978-3-319-73198-8_27
Project(s): SoBigData via OpenAIRE

See at: ISTI Repository Open Access | ISTI Repository | Springer Proceedings in Complexity Restricted | link.springer.com | CNR ExploRA
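
The agglomerative grouping mentioned in the abstract above can be illustrated with a small greedy sketch. It is an assumption-laden simplification rather than the authors' algorithm: at each step it merges the two areas exchanging the most trips, which turns that exchange from external into internal traffic, and it stops when the strongest remaining exchange falls below a hypothetical `min_flow` threshold. The municipality names and trip counts are invented for the example.

```python
from itertools import combinations


def flow_between(flows, first, second):
    """Total trips exchanged, in both directions, between two sets of municipalities."""
    return sum(
        trips
        for (origin, destination), trips in flows.items()
        if (origin in first and destination in second)
        or (origin in second and destination in first)
    )


def merge_functional_areas(municipalities, flows, min_flow):
    """Greedily merge the pair of areas with the largest mutual flow until it drops below min_flow."""
    areas = [frozenset([m]) for m in municipalities]
    while len(areas) > 1:
        best_pair = max(combinations(areas, 2), key=lambda pair: flow_between(flows, *pair))
        if flow_between(flows, *best_pair) < min_flow:
            break  # remaining areas exchange too little traffic to be grouped
        merged = best_pair[0] | best_pair[1]
        areas = [area for area in areas if area not in best_pair] + [merged]
    return set(areas)


# Hypothetical trip counts between four municipalities: (origin, destination) -> trips.
flows = {("A", "B"): 100, ("B", "A"): 80, ("C", "D"): 60, ("B", "C"): 5, ("A", "D"): 2}
areas = merge_functional_areas(["A", "B", "C", "D"], flows, min_flow=50)
```

With these numbers the procedure groups A with B and C with D, and leaves the two resulting areas separate because they exchange only seven trips.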

2018 Conference article Open Access

Diffusive Phenomena in Dynamic Networks: a data-driven study
Milli L., Rossetti G., Pedreschi D., Giannotti F.
Every day, ideas, information as well as viruses spread over complex social tissues described by our interpersonal relations. So far, the network contexts upon which diffusive phenomena unfold have usually been considered static, composed of a fixed set of nodes and edges. Recent studies describe social networks as rapidly changing topologies. In this work, following a data-driven approach, we compare the behaviors of classical spreading models when used to analyze a given social network whose topological dynamics are observed at different temporal granularities. Our goal is to shed some light on the impacts that the adoption of a static topology has on spreading simulations as well as to provide an alternative formulation of two classical diffusion models.
Source: 9th Conference on Complex Networks, CompleNet, pp. 151–159, Boston, USA, 6/3/2018
DOI: 10.1007/978-3-319-73198-8_13
Project(s): CIMPLEX via OpenAIRE, SoBigData via OpenAIRE

See at: ISTI Repository Open Access | Springer Proceedings in Complexity Restricted | link.springer.com | CNR ExploRA

2018 Conference article Open Access

The fractal dimension of music: geography, popularity and sentiment analysis
Pollacci L., Rossetti G., Guidotti R., Giannotti F., Pedreschi D.
Nowadays there is a growing standardization of musical contents. Our finding comes out from a cross-service multi-level dataset analysis where we study how geography affects the music production. The investigation presented in this paper highlights the existence of a "fractal" musical structure that relates the technical characteristics of the music produced at regional, national and world level. Moreover, a similar structure emerges also when we analyze the musicians' popularity and the polarity of their songs defined as the mood that they are able to convey. Furthermore, the clusters identified are markedly distinct one from another with respect to popularity and sentiment.
Source: GOODTECHS 2017 - Third International Conference on Smart Objects and Technologies for Social Good, pp. 183–194, Pisa, Italy, 29-30 November 2017
DOI: 10.1007/978-3-319-76111-4_19
Project(s): SoBigData via OpenAIRE

See at: ISTI Repository Open Access | Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering Restricted | link.springer.com | CNR ExploRA

2018 Journal article Open Access

Discovering temporal regularities in retail customers' shopping behavior
Guidotti R., Gabrielli L., Monreale A., Pedreschi D., Giannotti F.
In this paper we investigate the regularities characterizing the temporal purchasing behavior of the customers of a retail market chain. Most of the literature studying purchasing behavior focuses on what customers buy while giving little importance to the temporal dimension. As a consequence, the state of the art does not allow capturing the temporal purchasing patterns of each customer. These patterns should describe the customer's temporal habits, highlighting when she typically makes a purchase in correlation with information about the amount of expenditure, number of purchased items and other similar aggregates. This knowledge could be exploited for different purposes: setting temporal discounts to make customers' purchases more regular with respect to time, setting personalized discounts in the day and time window preferred by the customer, providing recommendations for the shopping time schedule, etc. To this aim, we introduce a framework for extracting from personal retail data a temporal purchasing profile able to summarize whether and when a customer makes her distinctive purchases. The individual profile describes a set of regular and characterizing shopping behavioral patterns, and the sequences in which these patterns take place. We show how to compare different customers by providing a collective perspective to their individual profiles, and how to group the customers with respect to these comparable profiles. By analyzing real datasets containing millions of shopping sessions we found that there is a limited number of patterns summarizing the temporal purchasing behavior of all the customers, and that they are sequentially followed in a finite number of ways. Moreover, we recognized regular customers characterized by a small number of temporal purchasing behaviors, and changing customers characterized by various types of temporal purchasing behaviors. Finally, we discuss how the profiles can be exploited both by customers to enable personalized services, and by the retail market chain for providing tailored discounts based on temporal purchasing regularity.
Source: EPJ 7 (2018): 6. doi:10.1140/epjds/s13688-018-0133-0
DOI: 10.1140/epjds/s13688-018-0133-0
Project(s): SoBigData via OpenAIRE

2018 Report Open Access

Local rule-based explanations of black box decision systems
Guidotti R., Monreale A., Ruggieri S., Pedreschi D., Turini F., Giannotti F.
Recent years have witnessed the rise of accurate but obscure decision systems which hide the logic of their internal decision processes from the users. The lack of explanations for the decisions of black box systems is a key ethical issue, and a limitation to the adoption of machine learning components in socially sensitive and safety-critical contexts. Therefore, we need explanations that reveal the reasons why a predictor takes a certain decision. In this paper we focus on the problem of black box outcome explanation, i.e., explaining the reasons of the decision taken on a specific instance. We propose LORE, an agnostic method able to provide interpretable and faithful explanations. LORE first learns a local interpretable predictor on a synthetic neighborhood generated by a genetic algorithm. Then it derives from the logic of the local interpretable predictor a meaningful explanation consisting of: a decision rule, which explains the reasons of the decision; and a set of counterfactual rules, suggesting the changes in the instance's features that lead to a different outcome. Extensive experiments show that LORE outperforms existing methods and baselines both in the quality of explanations and in the accuracy in mimicking the black box.
Source: ISTI Technical reports, 2018
Project(s): SoBigData via OpenAIRE

See at: arxiv.org Open Access | ISTI Repository | CNR ExploRA

2018 Report Open Access

Open the black box data-driven explanation of black box decision systems
Pedreschi D., Giannotti F., Guidotti R., Monreale A., Pappalardo L., Ruggieri S., Turini F.
Black box systems for automated decision making, often based on machine learning over (big) data, map a user's features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases hidden in the algorithms, due to human prejudices and collection artifacts hidden in the training data, which may lead to unfair or wrong decisions. We introduce the local-to-global framework for black box explanation, a novel approach with promising early results, which paves the road for a wide spectrum of future developments along three dimensions: (i) the language for expressing explanations in terms of highly expressive logic-based rules, with a statistical and causal interpretation; (ii) the inference of local explanations aimed at revealing the logic of the decision adopted for a specific instance by querying and auditing the black box in the vicinity of the target instance; (iii) the bottom-up generalization of the many local explanations into simple global ones, with algorithms that optimize the quality and comprehensibility of explanations.
Source: ISTI Technical reports, 2018
Project(s): SoBigData via OpenAIRE

See at: arxiv.org Open Access | ISTI Repository | CNR ExploRA